Feature Selection for Clustering by Exploring Nearest and Farthest Neighbors
Abstract
Feature selection has been explored extensively for use in several real-world applications. In this paper, we propose a new method that selects a salient subset of features from unlabeled data; the selected features are then used adaptively to identify natural clusters in cluster analysis. Unlike previous methods that select salient features for clustering, our method does not require a predetermined clustering algorithm to identify salient features, and it can potentially ignore noisy features, which improves the identification of salient ones. Our feature selection method is motivated by a basic characteristic of clustering: a data instance usually belongs to the same cluster as its geometrically nearest neighbors and to a different cluster from its geometrically farthest neighbors. In particular, our method uses instance-based learning to quantify features in the context of the nearest and the farthest neighbors of every instance, so that clusters generated by the salient features maintain this characteristic.
Keywords: feature selection; nearest neighbor; farthest neighbor; salient feature; cluster analysis.
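The abstract does not state the scoring formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Relief-style score in which a feature is rewarded when an instance differs strongly from its farthest neighbors and only slightly from its nearest neighbors on that feature. The function name `neighbor_feature_scores`, the parameter `k`, the min-max scaling, and the use of Euclidean distance over all features are illustrative assumptions.

```python
import numpy as np

def neighbor_feature_scores(X, k=3):
    """Hypothetical Relief-style score per feature (an assumption, not the paper's formula).

    score[f] = mean |x_f - farthest_f| - mean |x_f - nearest_f|
    Higher scores suggest that feature f separates far neighbors
    while keeping near neighbors close.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of instances")

    # Min-max scale each feature so that per-feature gaps are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Z = (X - X.min(axis=0)) / span

    # Pairwise per-feature gaps (n x n x d) and Euclidean distances (n x n).
    gaps = np.abs(Z[:, None, :] - Z[None, :, :])
    dist = np.sqrt((gaps ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # push each instance to the end of its own ranking

    order = np.argsort(dist, axis=1)
    nearest = order[:, :k]                    # k closest other instances
    farthest = order[:, n - 1 - k:n - 1]      # k most distant, excluding self

    rows = np.arange(n)[:, None]
    near_gap = gaps[rows, nearest].mean(axis=(0, 1))
    far_gap = gaps[rows, farthest].mean(axis=(0, 1))
    return far_gap - near_gap

# Toy data: feature 0 splits the instances into two groups,
# feature 1 takes the same spread of values in both groups.
f0 = np.array([0.0] * 10 + [10.0] * 10)
f1 = np.array([(i * 7) % 10 for i in range(20)], dtype=float)
X = np.column_stack([f0, f1])

scores = neighbor_feature_scores(X, k=3)
ranking = np.argsort(scores)[::-1]  # feature indices, most salient first
print(scores, ranking)
```

Note that this sketch computes neighbors from all features at once, so it does not reproduce the noise-handling behavior the abstract describes; it only illustrates how nearest and farthest neighbors can be turned into a per-feature score without running a clustering algorithm.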
Similar resources
Feature selection for clustering using instance-based learning by exploring the nearest and farthest neighbors
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and the Internet predominate in users' lives, they bring not only business opportunities and time savings but also threats such as information theft and system penetration, affecting both hardware and software. Security is the top priority in preventing cyber-attacks, and users should first detect the type of attack, because virtual environments are not moni...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
Algorithm For Identifying Relevant Features Using Fast Clustering
In high-dimensional data sets, feature selection involves identifying a subset of the most useful features that produces results compatible with those of the original, entire set of features. A fast algorithm may be evaluated on two counts: the time required to find a subset of features and the quality of that subset. Fast clustering based featur...
Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
We present a non-linear, simple, yet effective, feature subset selection method for regression and use it in analyzing cortical neural activity. Our algorithm involves a feature-weighted version of the k-nearest-neighbor algorithm. It is able to capture complex dependency of the target function on its input and makes use of the leave-one-out error as a natural regularization. We explain the cha...